Optimal Testing for Planted Satisfiability Problems
We study the problem of detecting planted solutions in a random
satisfiability formula. Adopting the formalism of hypothesis testing in
statistical analysis, we describe the minimax optimal rates of detection. Our
analysis relies on the study of the number of satisfying assignments, for which
we prove new results. We also address algorithmic issues, and give a
computationally efficient test with optimal statistical performance. This
result is compared to an average-case hypothesis on the hardness of refuting
satisfiability of random formulas.
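The planted model induces a polarity bias: in a clause conditioned on being satisfied by the hidden assignment, each literal agrees with that assignment with probability 4/7 rather than 1/2. As a hedged illustration only (not the paper's minimax-optimal test), the sketch below detects this bias by summing, over variables, the absolute gap between positive and negative literal counts; all function names are hypothetical.

```python
import random

def random_clause(n, rng):
    # A uniformly random 3-clause: 3 distinct variables, random signs.
    return [(v, rng.choice([1, -1])) for v in rng.sample(range(n), 3)]

def planted_clause(n, sigma, rng):
    # A random 3-clause conditioned on being satisfied by the planted sigma.
    while True:
        clause = random_clause(n, rng)
        if any((sign == 1) == sigma[v] for v, sign in clause):
            return clause

def polarity_statistic(formula, n):
    # Sum over variables of |#positive - #negative occurrences|:
    # small under the null, inflated by the planted polarity bias.
    d = [0] * n
    for clause in formula:
        for v, sign in clause:
            d[v] += sign
    return sum(abs(x) for x in d)

rng = random.Random(0)
n, m = 100, 5000
sigma = [rng.random() < 0.5 for _ in range(n)]
t_null = polarity_statistic([random_clause(n, rng) for _ in range(m)], n)
t_planted = polarity_statistic([planted_clause(n, sigma, rng) for _ in range(m)], n)
```

With these parameters the planted statistic is roughly twice the null one, so a simple threshold separates the two hypotheses.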
Optimal detection of sparse principal components in high dimension
We perform a finite sample analysis of the detection levels for sparse
principal components of a high-dimensional covariance matrix. Our minimax
optimal test is based on a sparse eigenvalue statistic. Alas, computing this
test is known to be NP-complete in general, and we describe a computationally
efficient alternative test using convex relaxations. Our relaxation is also
proved to detect sparse principal components at near optimal detection levels,
and it performs well on simulated datasets. Moreover, using polynomial time
reductions from theoretical computer science, we bring significant evidence
that our results cannot be improved, thus revealing an inherent trade-off
between statistical and computational performance.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org), DOI http://dx.doi.org/10.1214/13-AOS1127
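The sparse eigenvalue statistic maximizes the top eigenvalue of the empirical covariance over all size-k supports. A brute-force sketch, feasible only in tiny dimensions, which is exactly why the NP-completeness and the convex relaxation matter; names and parameters here are illustrative, not taken from the paper:

```python
import itertools
import numpy as np

def k_sparse_eigenvalue(S, k):
    # Largest eigenvalue over all k-by-k principal submatrices of S.
    # Exhaustive over supports, hence exponential-time in general.
    best = -np.inf
    for support in itertools.combinations(range(S.shape[0]), k):
        sub = S[np.ix_(support, support)]
        best = max(best, np.linalg.eigvalsh(sub)[-1])
    return best

rng = np.random.default_rng(0)
p, n, k = 10, 200, 2
X0 = rng.standard_normal((n, p))             # null: isotropic noise
v = np.zeros(p); v[:k] = 1 / np.sqrt(k)      # k-sparse spike direction
X1 = X0 + np.sqrt(2.0) * rng.standard_normal((n, 1)) * v  # spiked alternative
t0 = k_sparse_eigenvalue(X0.T @ X0 / n, k)
t1 = k_sparse_eigenvalue(X1.T @ X1 / n, k)
```

Under the spiked model the statistic concentrates near 1 + theta (here theta = 2), well above its null value, so thresholding it detects the sparse component.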
Average-case Hardness of RIP Certification
The restricted isometry property (RIP) for design matrices gives guarantees
for optimal recovery in sparse linear models. It is of high interest in
compressed sensing and statistical learning. This property is particularly
important for computationally efficient recovery methods. As a consequence,
even though it is in general NP-hard to check that RIP holds, there have been
substantial efforts to find tractable proxies for it. These would allow the
construction of RIP matrices and the polynomial-time verification of RIP given
an arbitrary matrix. We consider the framework of average-case certifiers,
which never wrongly declare that a matrix is RIP while often being correct for
random instances. While there are such certifiers that are tractable in a
suboptimal parameter regime, we show that this is a computationally hard task
in any better regime. Our results are based on a new, weaker assumption on the
problem of detecting dense subgraphs.
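For intuition, the RIP constant itself can be computed exactly by exhausting all k-column submatrices; this is the exponential-time baseline that tractable certifiers try to approximate from one side. A minimal sketch under that definition (hypothetical helper, small sizes only):

```python
import itertools
import numpy as np

def rip_constant(A, k):
    # Smallest delta with (1-delta)|x|^2 <= |Ax|^2 <= (1+delta)|x|^2
    # for all k-sparse x: scan the spectrum of every k-column Gram matrix.
    delta = 0.0
    for cols in itertools.combinations(range(A.shape[1]), k):
        sub = A[:, cols]
        eig = np.linalg.eigvalsh(sub.T @ sub)
        delta = max(delta, abs(eig[0] - 1.0), abs(eig[-1] - 1.0))
    return delta

# Orthonormal columns act as an exact isometry on every sparse vector,
# so their RIP constant is zero.
delta_id = rip_constant(np.eye(4), 2)
```

An average-case certifier in the sense above would report only valid upper bounds on this quantity, never an underestimate, while running in polynomial time.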
Mirror Sinkhorn: Fast Online Optimization on Transport Polytopes
Optimal transport is an important tool in machine learning, making it possible
to capture geometric properties of the data through a linear program on transport
polytopes. We present a single-loop optimization algorithm for minimizing
general convex objectives on these domains, utilizing the principles of
Sinkhorn matrix scaling and mirror descent. The proposed algorithm is robust to
noise, and can be used in an online setting. We provide theoretical guarantees
for convex objectives and experimental results showcasing its effectiveness on
both synthetic and real-world data.
Comment: ICML 202
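The linear-cost special case of this scheme is the classical Sinkhorn loop: alternately rescale the rows and columns of exp(-eta * C) to match the target marginals. The sketch below shows only that special case, which Mirror Sinkhorn extends to general convex objectives via mirror-descent steps; the parameter values are illustrative.

```python
import numpy as np

def sinkhorn(C, r, c, eta=10.0, iters=500):
    # Alternating row/column scaling of K = exp(-eta*C) toward marginals (r, c).
    K = np.exp(-eta * C)
    u = np.ones_like(r)
    for _ in range(iters):
        v = c / (K.T @ u)   # fit column marginals
        u = r / (K @ v)     # fit row marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
C = rng.random((4, 4))        # cost matrix
r = np.full(4, 0.25)          # target row marginal
c = np.full(4, 0.25)          # target column marginal
P = sinkhorn(C, r, c)         # a point of the transport polytope
```

Each half-step is a coordinate-wise multiplicative update, which is exactly the entropic mirror-descent geometry the single-loop algorithm builds on.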
Resource Allocation for Statistical Estimation
Statistical estimation in many contemporary settings involves the
acquisition, analysis, and aggregation of datasets from multiple sources, which
can have significant differences in character and in value. Due to these
variations, the effectiveness of employing a given resource (e.g., a sensing
device or computing power) for gathering or processing data from a particular
source depends on the nature of that source. As a result, the appropriate
division and assignment of a collection of resources to a set of data sources
can substantially impact the overall performance of an inferential strategy. In
this expository article, we adopt a general view of the notion of a resource
and its effect on the quality of a data source, and we describe a framework for
the allocation of a given set of resources to a collection of sources in order
to optimize a specified metric of statistical efficiency. We discuss several
stylized examples involving inferential tasks such as parameter estimation and
hypothesis testing based on heterogeneous data sources, in which optimal
allocations can be computed either in closed form or via efficient numerical
procedures based on convex optimization.
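One closed-form case is worth making concrete. If source i contributes variance sigma_i**2 / n_i when given n_i samples, minimizing the total variance under a budget N yields n_i proportional to sigma_i, a Neyman-style square-root allocation. A hedged sketch under that stylized model (not drawn verbatim from the article):

```python
def optimal_allocation(sigmas, N):
    # Minimize sum_i sigma_i**2 / n_i subject to sum_i n_i = N.
    # Stationarity of the Lagrangian gives n_i proportional to sigma_i.
    total = sum(sigmas)
    return [N * s / total for s in sigmas]

def total_variance(sigmas, ns):
    return sum(s * s / n for s, n in zip(sigmas, ns))

sigmas = [1.0, 2.0, 4.0]
N = 70
n_opt = optimal_allocation(sigmas, N)        # [10.0, 20.0, 40.0]
v_opt = total_variance(sigmas, n_opt)        # (1+2+4)**2 / 70 = 0.7
v_uni = total_variance(sigmas, [N / 3] * 3)  # uniform split for comparison
```

By Cauchy-Schwarz the optimal value (sum_i sigma_i)**2 / N never exceeds the uniform allocation's value, with equality only when all sources are equally noisy.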